Reducing Parallel Overheads Through Dynamic Serialization
Abstract
If parallelism can be successfully exploited in a program, significant reductions in execution time can be achieved. However, if sections of the code are dominated by parallel overheads, overall program performance can degrade. We propose a framework, based on an inspector-executor model, for identifying loops that are dominated by parallel overheads and dynamically serializing these loops. We implement this framework in the Polaris parallelizing compiler and evaluate two portable methods for classifying loops as profitable or unprofitable. We show that for six benchmark programs from the Perfect Club and SPEC 95 suites, parallel program execution times can be improved by as much as 85% on 16 processors of an...
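The general idea of an inspector deciding at run time whether a loop should run in parallel or be serialized can be illustrated with a minimal sketch. This is not the classification method evaluated in the paper, which the abstract does not describe; the cost model, the constants `PARALLEL_OVERHEAD` and `COST_PER_ITERATION`, and the function names `is_profitable` and `execute` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cost-model constants in arbitrary units (assumptions, not
# values from the paper). A real system would measure or calibrate them.
PARALLEL_OVERHEAD = 1000.0   # assumed fixed cost of starting a parallel loop
COST_PER_ITERATION = 1.0     # assumed cost of one loop iteration


def is_profitable(n_iterations, workers):
    """Inspector: predict whether running the loop in parallel pays off."""
    serial_cost = n_iterations * COST_PER_ITERATION
    parallel_cost = PARALLEL_OVERHEAD + serial_cost / workers
    return parallel_cost < serial_cost


def execute(body, n_iterations, workers=4):
    """Executor: run the loop in the mode the inspector selected.

    Returns the chosen mode and the per-iteration results in loop order.
    """
    if is_profitable(n_iterations, workers):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves the order of the input iterations.
            return "parallel", list(pool.map(body, range(n_iterations)))
    # Overhead-dominated loop: serialize it dynamically.
    return "serial", [body(i) for i in range(n_iterations)]
```

Under these assumed constants, a call such as `execute(lambda i: i * i, 10)` takes the serial path, because the fixed overhead outweighs the work of ten iterations, while a loop of several thousand iterations takes the parallel path.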
Similar resources
Non-data-communication Overheads in MPI: Analysis on Blue Gene/P
Modern HEC systems, such as Blue Gene/P, rely on achieving high-performance by using the parallelism of a massive number of low-frequency/low-power processing cores. This means that the local pre- and post-communication processing required by the MPI stack might not be very fast, owing to the slow processing cores. Similarly, small amounts of serialization within the MPI stack that were acceptabl...
The Importance of Non-Data-Communication Overheads in MPI
With processor speeds no longer doubling every 18-24 months owing to the exponential increase in power consumption and heat dissipation, modern HEC systems tend to rely less on the performance of single processing units. Instead, they rely on achieving high-performance by using the parallelism of a massive number of low-frequency/low-power processing cores. Using such low-frequency cores, how...
Making the Compilation “Pipeline” Explicit: Dynamic Compilation Using Trace Tree Serialization
Trace-based compilers operate by dynamically discovering loop headers and then recording and compiling all paths through a loop that are executed with sufficient frequency. The different paths through each loop form a tree, with the loop header at the root, in which common code is shared up-stream. Such trace-trees can be serialized in a specific manner that allows us to organize the compiler p...
Metadata-Based Parallelization of Program Instrumentation
Program instrumentation has a wide variety of useful applications, but tool writers must overcome the challenge of substantial overheads caused by introducing additional code and data into a program. This paper observes that instrumentation usually operates on many discrete, independent data structures, which we call metadata parallelism. We propose to exploit this phenomenon to reduce the over...
Reducing overheads of dynamic scheduling on heterogeneous chips
In recent processor development, we have witnessed the integration of GPU and CPUs into a single chip. The result of this integration is a reduction of the data communication overheads. This enables an efficient collaboration of both devices in the execution of parallel workloads. In this work, we focus on the problem of efficiently scheduling chunks of iterations of parallel loops among the co...
Publication date: 1999